Coalescent experiments I: Unlabeled n-coalescent and the site frequency spectrum
Abstract
We derive the transition structure of a Markovian lumping of Kingman’s n-coalescent [1, 2], where lumping a Markov chain is meant in the sense of [3, def. 6.3.1]. The lumped Markov process, referred to as the unlabeled n-coalescent, is a continuous-time Markov chain on the set of all integer partitions of the sample size n. We derive the backward-transition, forward-transition, state-specific, and sequence-specific probabilities of this chain. We show that the likelihood of any given site-frequency spectrum (SFS), a commonly used statistic in genome scans, from a locus free of intra-locus recombination can be obtained directly by integrating over conditional realizations of the unlabeled n-coalescent. We develop a controlled Markov chain for importance sampling such integrals from an augmented unlabeled n-coalescent forward in time. We apply the methods to population-genetic data to conduct demographic inference at the empirical resolution of the site-frequency spectra. We also extend a family of classical hypothesis tests of standard neutrality at a non-recombining locus, based on any statistic of the SFS, to a more powerful version that conditions on the topological information contained in the SFS. We formalize a graph of coalescent experiments to set a decision-theoretic stage for population-genetic inference across different empirical resolutions.
Keywords: partially ordered n-coalescent experiments graph; controlled Markov chain for importance sampling
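To make the state space concrete, the following is a minimal sketch of the backward jump chain on integer partitions of n. It is not taken from the paper: it assumes only the standard rule of Kingman's coalescent that every pair of blocks is equally likely to merge next, and then keeps track of block sizes alone. The function name `backward_transitions` and the representation of a partition as a sorted tuple of block sizes are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def backward_transitions(partition):
    """Return {next_partition: probability} for one coalescence event.

    `partition` is a tuple of block sizes, e.g. (1, 1, 2) for n = 4.
    Assumption: each unordered pair of blocks merges with equal
    probability, as in the jump chain of Kingman's coalescent.
    """
    k = len(partition)
    if k < 2:
        return {}  # a single block of size n is absorbing
    counts = Counter(partition)   # counts[i] = number of blocks of size i
    total_pairs = comb(k, 2)      # number of unordered pairs of blocks
    sizes = sorted(counts)
    result = {}
    for idx, i in enumerate(sizes):
        for j in sizes[idx:]:
            # Number of unordered block pairs with sizes i and j.
            pairs = comb(counts[i], 2) if i == j else counts[i] * counts[j]
            if pairs == 0:
                continue
            merged = list(partition)
            merged.remove(i)
            merged.remove(j)
            merged.append(i + j)
            key = tuple(sorted(merged))
            result[key] = result.get(key, Fraction(0)) + Fraction(pairs, total_pairs)
    return result


if __name__ == "__main__":
    for state, prob in backward_transitions((1, 1, 2)).items():
        print(state, prob)
```

For n = 4, the state (1, 1, 2) moves to (2, 2) with probability 1/3 and to (1, 3) with probability 2/3 under this rule. Exact fractions are used so that the probabilities of each state can be checked to sum to one.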
Similar resources
Statistical properties of the site-frequency spectrum associated with lambda-coalescents.
Statistical properties of the site-frequency spectrum associated with Λ-coalescents are our objects of study. In particular, we derive recursions for the expected value, variance, and covariance of the spectrum, extending earlier results of Fu (1995) for the classical Kingman coalescent. Estimating coalescent parameters introduced by certain Λ-coalescents for data sets too large for full-likeli...
The matrix coalescent and an application to human single-nucleotide polymorphisms.
The "matrix coalescent" is a reformulation of the familiar coalescent process of population genetics. It ignores the topology of the gene tree and treats the coalescent as a Markov process describing the decay in the number of ancestors of a sample of genes as one proceeds backward in time. The matrix formulation of this process is convenient when the population changes in size, because such ch...
The site-frequency spectrum associated with Ξ-coalescents.
We give recursions for the expected site-frequency spectrum associated with so-called Xi-coalescents, that is exchangeable coalescents which admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, the s...
Quadri-allele frequency spectrum in a coalescent topology for mutations in non-constant population size
The sample frequency spectrum of a segregating site is the probability distribution of a sample of alleles from a genetic locus, conditional on observing the sample to have more than one clearly distinct phenotype. We present a model for analyzing quadri-allele frequency spectrum, where the ancestral population diverged into three populations at a certain divergence time and the resulting mut...
Nucleotide variation and balancing selection at the Ckma gene in Atlantic cod: analysis with multiple merger coalescent models
High-fecundity organisms, such as Atlantic cod, can withstand substantial natural selection and the entailing genetic load of replacing alleles at a number of loci due to their excess reproductive capacity. High-fecundity organisms may reproduce by sweepstakes, leading to a highly skewed, heavy-tailed offspring distribution. Under such reproduction the Kingman coalescent of binary mergers breaks do...